Distinguishing regional from within-codon rate heterogeneity in DNA sequence alignments
We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The focus of the present work is on improving the modelling of the latter aspect. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments.
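As a minimal sketch of the "factorial" idea only (not the model of the paper), the hidden state of an FHMM can be viewed as a tuple of states from independent chains, so the joint transition probability is the product of the per-chain probabilities. The two chains below (a topology chain and a rate chain) and all numbers are hypothetical and chosen purely for illustration.

```python
def joint_transition(trans_a, trans_b):
    """Combine two independent Markov chains into one joint chain.

    Returns a dict mapping ((a_from, b_from), (a_to, b_to)) to the
    product of the two per-chain transition probabilities.
    """
    joint = {}
    for a_from, row_a in enumerate(trans_a):
        for b_from, row_b in enumerate(trans_b):
            for a_to, p_a in enumerate(row_a):
                for b_to, p_b in enumerate(row_b):
                    joint[((a_from, b_from), (a_to, b_to))] = p_a * p_b
    return joint


# Hypothetical chain over 3 tree topologies (changes model recombination).
topology_trans = [
    [0.98, 0.01, 0.01],
    [0.01, 0.98, 0.01],
    [0.01, 0.01, 0.98],
]
# Hypothetical chain over 2 rate classes (changes model rate heterogeneity).
rate_trans = [
    [0.9, 0.1],
    [0.2, 0.8],
]

joint = joint_transition(topology_trans, rate_trans)
```

The joint chain here has 3 x 2 = 6 composite states, and each of its rows still sums to one because both input chains are row-stochastic.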
Efficient FPT algorithms for (strict) compatibility of unrooted phylogenetic trees
In phylogenetics, a central problem is to infer the evolutionary
relationships between a set of species; these relationships are often
depicted via a phylogenetic tree -- a tree having its leaves univocally labeled
by elements of this set and without degree-2 nodes -- called the "species tree".
One common approach for reconstructing a species tree consists in first
constructing several phylogenetic trees from primary data (e.g. DNA sequences
originating from some of these species), and then constructing a single
phylogenetic tree maximizing the "concordance" with the input trees. The
so-obtained tree is our estimation of the species tree and, when the input
trees are defined on overlapping -- but not identical -- sets of labels, is
called a "supertree". In this paper, we focus on two problems that are central
when combining phylogenetic trees into a supertree: the compatibility and the
strict compatibility problems for unrooted phylogenetic trees. These problems
are strongly related, respectively, to the notions of "containing as a minor"
and "containing as a topological minor" in the graph community. Both problems
are known to be fixed-parameter tractable in the number of input trees, by
using their expressibility in Monadic Second Order Logic and a reduction to
graphs of bounded treewidth. Motivated by the fact that the dependency of these
algorithms on the number of input trees is prohibitively large, we give the
first explicit dynamic programming algorithms for solving these problems, both
running in time , where is the total size of the input. Comment: 18 pages, 1 figure
Preservation of information in a prebiotic package model
The coexistence between different informational molecules has been the
preferred mode to circumvent the limitation posed by imperfect replication on
the amount of information stored by each of these molecules. Here we reexamine
a classic package model in which distinct information carriers or templates are
forced to coexist within vesicles, which in turn can proliferate freely through
binary division. The combined dynamics of vesicles and templates is described
by a multitype branching process which allows us to write equations for the
average number of the different types of vesicles as well as for their
extinction probabilities. The threshold phenomenon associated with the extinction
of the vesicle population is studied quantitatively using finite-size scaling
techniques. We conclude that the resultant coexistence is too frail in the
presence of parasites and so confinement of templates in vesicles without an
explicit mechanism of cooperation does not resolve the information crisis of
prebiotic evolution. Comment: 9 pages, 8 figures, accepted version, to be published in PR
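To illustrate how an extinction probability follows from a branching process, here is a minimal sketch for a much simpler single-type analogue. It is an assumption made for illustration, not the multitype model of the abstract: each vesicle divides into two daughters, and each daughter is independently viable with a hypothetical probability p. The extinction probability q is then the smallest solution in [0, 1] of q = (1 - p + p q)^2, which fixed-point iteration from q = 0 converges to.

```python
def extinction_probability(p, tol=1e-12, max_iter=100_000):
    """Smallest fixed point of f(q) = (1 - p + p*q)**2 in [0, 1].

    p is the (hypothetical) probability that a daughter vesicle is
    viable. Iterating f from q = 0 converges monotonically to the
    extinction probability of the single-type branching process.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = (1.0 - p + p * q) ** 2
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q


supercritical = extinction_probability(0.8)  # mean offspring 2p = 1.6 > 1
subcritical = extinction_probability(0.4)    # mean offspring 2p = 0.8 < 1
```

In this toy model the threshold sits at p = 1/2: below it the population dies out with certainty, and above it the extinction probability equals ((1 - p) / p)^2.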
Lassoing and corraling rooted phylogenetic trees
The construction of a dendrogram on a set of individuals is a key component of
a genomewide association study. However, even with modern sequencing
technologies the distances on the individuals required for the construction of
such a structure may not always be reliable, making it tempting to exclude them
from an analysis. This, in turn, results in an input set for dendrogram
construction that consists of only partial distance information, which raises
the following fundamental question. For what subset of its leaf set can we
uniquely reconstruct the dendrogram from the distances that it induces on that
subset? By formalizing a dendrogram in terms of an edge-weighted, rooted
phylogenetic tree on a pre-given finite set X with |X|>2 whose edge-weighting
is equidistant and a set of partial distances on X in terms of a set L of
2-subsets of X, we investigate this problem in terms of when such a tree is
lassoed, that is, uniquely determined by the elements in L. For this we
consider four different formalizations of the idea of "uniquely determining"
giving rise to four distinct types of lassos. We present characterizations for
all of them in terms of the child-edge graphs of the interior vertices of such
a tree. Our characterizations imply, in particular, that if the tree in
question is binary, then all four types of lasso must coincide.
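An equidistant edge-weighting means every leaf is at the same distance from the root, so the induced leaf-to-leaf distances form an ultrametric: in every triple of leaves, the two largest pairwise distances are equal. The following is a minimal sketch of that three-point check on a complete distance matrix. It does not implement the lasso characterizations of the abstract, and the example matrices are hypothetical.

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """Check the three-point condition on a symmetric distance matrix.

    For every triple of leaves, the two largest of the three pairwise
    distances must coincide (up to tol).
    """
    n = len(dist)
    for i, j, k in combinations(range(n), 3):
        _, second, largest = sorted((dist[i][j], dist[i][k], dist[j][k]))
        if abs(largest - second) > tol:
            return False
    return True


# Distances induced by a hypothetical dendrogram ((a, b), c).
equidistant = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
# Perturbing a single entry breaks the condition.
perturbed = [
    [0, 2, 6],
    [2, 0, 5],
    [6, 5, 0],
]

ok = is_ultrametric(equidistant)
bad = is_ultrametric(perturbed)
```

With only partial distances available, a check of this kind can be applied only to triples whose three pairs are all known, which is one way to see why the choice of pairs matters.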
Nocardia kroppenstedtii sp. nov., a novel actinomycete isolated from a lung transplant patient with a pulmonary infection
An actinomycete, strain N1286T, isolated from a lung transplant patient with a pulmonary infection, was provisionally assigned to the genus Nocardia. The strain had chemotaxonomic and morphological properties typical of members of the genus Nocardia and formed a distinct phyletic line in the Nocardia 16S rRNA gene tree. It was most closely related to Nocardia farcinica DSM 43665T (99.8% gene similarity) but was distinguished from the latter by a low level of DNA:DNA relatedness. These strains were also distinguished by a broad range of phenotypic properties. On the basis of these data, it is proposed that isolate N1286T (=DSM 45810T = NCTC 13617T) should be classified as the type strain of a new Nocardia species for which the name Nocardia kroppenstedtii is proposed.
EVOLUTION FOR BIOINFORMATICIANS AND BIOINFORMATICS FOR EVOLUTIONISTS 1
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/72826/1/j.0014-3820.2005.tb00937.x.pd
An Alternative Model of Amino Acid Replacement
The observed correlations between pairs of homologous protein sequences are
typically explained in terms of a Markovian dynamic of amino acid substitution.
This model assumes that every location on the protein sequence has the same
background distribution of amino acids, an assumption that is incompatible with
the observed heterogeneity of protein amino acid profiles and with the success
of profile multiple sequence alignment. We propose an alternative model of
amino acid replacement during protein evolution based upon the assumption that
the variation of the amino acid background distribution from one residue to the
next is sufficient to explain the observed sequence correlations of homologs.
The resulting dynamical model of independent replacements drawn from
heterogeneous backgrounds is simple and consistent, and provides a unified
homology match score for sequence-sequence, sequence-profile and
profile-profile alignment. Comment: Minor improvements. Added figure and reference
Circular Networks from Distorted Metrics
Trees have long been used as a graphical representation of species
relationships. However, complex evolutionary events, such as genetic
reassortments or hybrid speciations, which occur commonly in viruses, bacteria
and plants, do not fit into this elementary framework. Alternatively, various
network representations have been developed. Circular networks are a natural
generalization of leaf-labeled trees interpreted as split systems, that is,
collections of bipartitions over leaf labels corresponding to current species.
Although such networks do not explicitly model specific evolutionary events of
interest, their straightforward visualization and fast reconstruction have made
them a popular exploratory tool to detect network-like evolution in genetic
datasets.
Standard reconstruction methods for circular networks, such as Neighbor-Net,
rely on an associated metric on the species set. Such a metric is first
estimated from DNA sequences, which leads to a key difficulty: distantly
related sequences produce statistically unreliable estimates. This is
problematic for Neighbor-Net as it is based on the popular tree reconstruction
method Neighbor-Joining, whose sensitivity to distance estimation errors is
well established theoretically. In the tree case, more robust reconstruction
methods have been developed using the notion of a distorted metric, which
captures the dependence of the error in the distance through a radius of
accuracy. Here we design the first circular network reconstruction method based
on distorted metrics. Our method is computationally efficient. Moreover, the
analysis of its radius of accuracy highlights the important role played by the
maximum incompatibility, a measure of the extent to which the network differs
from a tree. Comment: Submitted
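A split system is circular when the taxa can be placed around a circle so that every split separates one contiguous arc from the rest. As a minimal sketch (not the reconstruction method of the abstract), the function below tests whether a single bipartition is compatible with a given circular ordering by counting how often membership changes around the cycle. The taxon names are hypothetical.

```python
def is_circular_split(ordering, side):
    """Return True if `side` forms one contiguous arc of `ordering`.

    `ordering` lists the taxa around a circle; `side` is one block of
    a bipartition of those taxa. A proper split is circular exactly
    when membership changes twice while walking once around the cycle.
    """
    side = set(side)
    n = len(ordering)
    if not side or not side < set(ordering):
        return False  # not a proper bipartition of the taxa
    changes = sum(
        (ordering[i] in side) != (ordering[(i + 1) % n] in side)
        for i in range(n)
    )
    return changes == 2


order = ["a", "b", "c", "d", "e"]
adjacent = is_circular_split(order, {"a", "b"})
wrapping = is_circular_split(order, {"e", "a"})
crossing = is_circular_split(order, {"a", "c"})
```

Here `{"e", "a"}` counts as contiguous because the ordering wraps around, while `{"a", "c"}` is interrupted by `b` and so cannot be drawn as a single arc.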
Phylogenetic inference under recombination using Bayesian stochastic topology selection
Motivation: Conventional phylogenetic analysis for characterizing the relatedness between taxa typically assumes that a single relationship exists between species at every site along the genome. This assumption fails to take into account recombination, which is a fundamental process for generating diversity and can lead to spurious results. Recombination induces a localized phylogenetic structure which may vary along the genome. Here, we generalize a hidden Markov model (HMM) to infer changes in phylogeny along multiple sequence alignments while accounting for rate heterogeneity; the hidden states refer to the unobserved phylogenetic topology underlying the relatedness at a genomic location. The number of hidden states (topologies) and their structure are random (not known a priori) and are sampled using Markov chain Monte Carlo algorithms. The HMM structure allows us to analytically integrate over all possible changepoints in topologies as well as all the unknown branch lengths.
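The claim that the HMM structure sums over all possible changepoints can be illustrated with the standard forward recursion, sketched below for a fixed, known set of topologies. This is a simplification assumed for illustration: it does not sample topologies by MCMC or integrate over branch lengths, and the per-site likelihoods and transition probabilities are hypothetical numbers. A brute-force sum over every hidden path is included to show that both compute the same quantity.

```python
from itertools import product


def forward_likelihood(init, trans, site_lik):
    """Alignment likelihood summed over all hidden topology paths.

    init[s]        : prior probability of topology s at the first site
    trans[s][t]    : probability of moving from topology s to t
    site_lik[i][s] : likelihood of alignment column i under topology s
    """
    n_states = len(init)
    alpha = [init[s] * site_lik[0][s] for s in range(n_states)]
    for lik in site_lik[1:]:
        alpha = [
            lik[t] * sum(alpha[s] * trans[s][t] for s in range(n_states))
            for t in range(n_states)
        ]
    return sum(alpha)


def brute_force_likelihood(init, trans, site_lik):
    """Same quantity by explicit enumeration of every hidden path."""
    total = 0.0
    for path in product(range(len(init)), repeat=len(site_lik)):
        p = init[path[0]] * site_lik[0][path[0]]
        for pos in range(1, len(path)):
            p *= trans[path[pos - 1]][path[pos]] * site_lik[pos][path[pos]]
        total += p
    return total


# Hypothetical two-topology example with a likely changepoint mid-alignment.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
site_lik = [[0.02, 0.001], [0.03, 0.002], [0.001, 0.04], [0.002, 0.05]]

fast = forward_likelihood(init, trans, site_lik)
slow = brute_force_likelihood(init, trans, site_lik)
```

The recursion costs time linear in the number of sites, whereas the enumeration grows exponentially. For long alignments the forward values would need rescaling or log-space arithmetic to avoid underflow, which this sketch omits.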